Online Value Function Improvement

Authors

  • Mitchell Keith Bloch
  • John Edwin Laird
Abstract

Our goal is to develop broadly competent agents that can dynamically construct an appropriate value function for tasks with large state spaces so that they can effectively and efficiently learn using reinforcement learning. We study the case where an agent’s state is determined by a small number of continuous dimensions, so that the problem of determining the relevant features corresponds roughly to that of determining the appropriate level of discretization of the continuous values. We adopt hierarchical tile coding, which applies state aggregation at multiple levels of state abstraction simultaneously. Using our formulation, it is possible to capture the advantages of learning with state abstractions ranging from general to specific using linear function approximation. We then develop a novel algorithm for incrementally refining the degree of state abstraction, based on cumulative absolute temporal difference error, which produces a sparse nonuniform tile coding. We empirically evaluate our approach in the Puddle World and Mountain Car environments. The results demonstrate that the static and incremental hierarchical tile codings significantly outperform individual tilings and multilevel tile codings (CMACs) for initial learning. Our results also indicate that the incrementally constructed tilings perform nearly as well as the full hierarchical tile coding while requiring an order of magnitude fewer weights.
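The two ideas in the abstract, a value estimate summed over tiles at several levels of abstraction and refinement driven by cumulative absolute temporal difference error, can be illustrated with a minimal sketch. This is not the authors' algorithm. The one-dimensional state, the binary split, the names `Tile`, `value`, and `td_update`, the `split_threshold` parameter, and the division of the step size by the number of active tiles are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    """One tile over [lo, hi) on a single continuous dimension (assumed 1-D)."""
    lo: float
    hi: float
    weight: float = 0.0
    cum_abs_td: float = 0.0  # cumulative |TD error| observed while this tile was the finest one
    children: list = field(default_factory=list)

    def active_path(self, x):
        """Return the tiles covering x, ordered from most general to most specific."""
        path = [self]
        node = self
        while node.children:
            mid = (node.lo + node.hi) / 2
            node = node.children[0] if x < mid else node.children[1]
            path.append(node)
        return path

    def split(self):
        """Refine this tile into two halves (hypothetical refinement rule)."""
        mid = (self.lo + self.hi) / 2
        self.children = [Tile(self.lo, mid), Tile(mid, self.hi)]


def value(root, x):
    """Linear function approximation: sum the weights of all active tiles."""
    return sum(t.weight for t in root.active_path(x))


def td_update(root, x, target, alpha=0.1, split_threshold=1.0):
    """Apply one TD-style update toward `target` and refine if error accumulates.

    `target` stands in for a bootstrapped return such as r + gamma * V(s').
    """
    path = root.active_path(x)
    delta = target - sum(t.weight for t in path)
    step = alpha / len(path)  # assumed: share the step size across active tiles
    for t in path:
        t.weight += step * delta
    leaf = path[-1]
    leaf.cum_abs_td += abs(delta)
    if leaf.cum_abs_td > split_threshold:
        leaf.split()  # only regions with persistent error gain finer tiles
    return delta
```

Because only tiles that accumulate error are split, the resulting tree is sparse and nonuniform: well-predicted regions keep a single coarse weight, while finer tiles inherit the generalization of their ancestors through the summed estimate.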


Similar resources

Neural-Smith Predictor Method for Improvement of Networked Control Systems

Networked control systems (NCSs) are distributed control systems in which the nodes, including controllers, sensors, actuators, and plants, are connected by a digital communication network such as the Internet. One of the most critical challenges in networked control systems is the stochastic time delay of arriving data packets in the communication network among the nodes. Using the Smith predic...


Effectiveness of Foot Orthoses on Knee Pain Intensity and Function in Female Athletes with Patellofemoral Pain Syndrome

Background and Objective: Foot orthoses are a common intervention for patients with patellofemoral pain syndrome, but limited information is available on the effects of foot orthoses on knee pain and function in athletes with patellofemoral pain syndrome. The aim of our study was to determine the effects of foot orthoses on reducing pain and increasing function of athletes with patellofemoral ...


Assessment of Improvement of Preventive Maintenance Systems Related to the Civil Projects Using Concepts of Value Engineering (RESEARCH NOTE)

The purpose of this paper is to use the concepts of value engineering (VE) to evaluate the improvement caused by preventive maintenance (PM) systems in construction projects. A real case is used to show how the proposed method can be implemented. VE is the systematic application of recognized techniques by multi-disciplined teams that identifies the function of a product or service, establishes a...


Economic Statistical Design of Multivariate T^2 Control Chart with Variable Sample Sizes

Today, quality improvement and cost reduction are key factors for achieving business success, growth, and position. Control charts are one of the primary tools for quality improvement and cost reduction in the online activities of statistical process control. As the need for monitoring several correlated quality characteristics is extensively growing, the use of multivariate control charts become...


Customer lifetime value model in an online toy store

Businesses all around the world use different approaches to know their customers, segment them, and formulate suitable strategies for them. One of these approaches is calculating the value of each customer to the company. In this paper, by calculating Customer Lifetime Value (CLV) for individual customers of an online toy store named Alakdolak, three customer segments are extracted. The level of ...


Performance Improvement of Direct Torque Controlled Interior Permanent Magnet Synchronous Motor Drives Using Artificial Intelligence

The main theme of this paper is to present a novel controller, a genetic-based fuzzy logic controller, for interior permanent magnet synchronous motor drives with direct torque control. A radial basis function network has been used for online tuning of the genetic-based fuzzy logic controller. Initially, different operating conditions are obtained based on motor dynamics incorporating...




Publication date: 2013